Recent progress in image recognition has stimulated the deployment of vision systems at an unprecedented scale. As a result, visual data are now often consumed not only by humans but also by machines. Existing image processing methods only optimize for better human perception, yet the resulting images may not be accurately recognized by machines. This can be undesirable, e.g., when the processed images are handled by search engines or recommendation systems. In this work, we examine simple approaches to improve machine recognition of processed images: optimizing the recognition loss directly on the image processing network, or through an intermediate transforming model. Interestingly, the processing model's ability to enhance recognition quality transfers when evaluated on recognition models of different architectures, recognized categories, tasks, and training datasets. This makes the approaches applicable even when we have no knowledge of the future recognition model, e.g., when uploading processed images to the Internet. We conduct experiments on multiple image processing tasks, with ImageNet classification and PASCAL VOC detection as recognition tasks. With these simple yet effective methods, substantial accuracy gains can be achieved with strong transferability and minimal image quality loss. Through a user study, we further show that the accuracy gain can transfer to black-box cloud models. Finally, we attempt to explain this transferability phenomenon by demonstrating the similarities between the decision boundaries of different models. Code is available at https://github.com/liuzhuang13/transferable_ra.
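A minimal sketch of the first variant described above (the L1 fidelity term and the loss weighting `lam` are illustrative assumptions, not the authors' exact setup): the processing network is updated on a weighted sum of an image-quality loss and the recognition loss of a frozen, pretrained classifier.

```python
import torch
import torch.nn.functional as F

def recognition_aware_step(processing_net, recognizer, degraded, target,
                           labels, optimizer, lam=0.1):
    """One training step: minimize an image-quality loss plus the recognition
    loss of a frozen classifier evaluated on the processed output."""
    output = processing_net(degraded)          # e.g. denoised or super-resolved image
    quality_loss = F.l1_loss(output, target)   # per-pixel fidelity term
    logits = recognizer(output)                # gradients flow through the frozen recognizer
    recognition_loss = F.cross_entropy(logits, labels)
    loss = quality_loss + lam * recognition_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The recognizer's weights stay fixed, e.g.:
# for p in recognizer.parameters():
#     p.requires_grad_(False)
```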
Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a certain criterion, redundant weights are pruned and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin, 2019), and find that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization.
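A hedged sketch of the structured-pruning step this abstract questions (the L1-norm criterion is one common choice among the methods examined; function names are illustrative): filters with the smallest L1 norms are dropped, yielding a smaller layer that can either inherit weights and be fine-tuned, or be re-initialized and trained from scratch, the baseline the paper argues should be reported.

```python
import torch
import torch.nn as nn

def l1_channel_scores(conv: nn.Conv2d) -> torch.Tensor:
    """L1 norm of each output filter, a common structured-pruning criterion."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the highest-L1 filters; return a smaller Conv2d with inherited weights."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    idx = torch.argsort(l1_channel_scores(conv), descending=True)[:n_keep]
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[idx])
    return pruned

# Baseline comparison from the abstract: fine-tune `pruned` with inherited
# weights vs. re-initialize the same small architecture and train from scratch.
```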
Figure 1: "Do as I Do" motion transfer: given a YouTube clip of a ballerina (top), and a video of a graduate student performing various motions, our method transfers the ballerina's performance onto the student (bottom).
With the development of electronic technology and improvements in production techniques, industrial robots now provide great benefits to social services and industrial production. However, long-term mechanical wear and structural deformation lead to low absolute positioning accuracy, which greatly hinders the development of the manufacturing industry. Calibrating a robot's kinematic parameters is an effective way to address this problem. However, mainstream measurement devices, such as laser trackers and coordinate measuring machines, are expensive and require specially trained personnel to operate. Moreover, measurement noise caused by many environmental factors during the measurement process degrades the robot's calibration accuracy. On this basis, we complete the following work: a) a robot calibration method based on plane constraints to simplify the measurement steps; b) a square-root cubature Kalman filter (SCKF) algorithm to reduce the influence of measurement noise; c) a novel algorithm for identifying kinematic parameters based on the SCKF algorithm and the Levenberg-Marquardt (LM) algorithm to achieve high calibration accuracy; d) a dial indicator as the measurement device to cut costs. Extensive experiments validate the effectiveness of the proposed calibration algorithm and experimental platform.
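A sketch of the LM identification step under a plane constraint, with loudly labeled assumptions: the forward-kinematics function is a placeholder for the robot's own DH model, and the plane parameterization z = a*x + b*y + c is one illustrative formulation, not necessarily the paper's. In the paper, this refinement follows SCKF filtering of the noisy dial-indicator measurements.

```python
import numpy as np
from scipy.optimize import least_squares

def fk_position(kin_params, joint_angles):
    """Placeholder forward kinematics: maps kinematic parameters and one joint
    configuration to a tool-tip position (x, y, z). Substitute the robot's
    DH-based FK here."""
    raise NotImplementedError

def plane_residuals(params, joint_configs):
    """Plane-constrained residuals: the tool tip should lie on a common plane
    z = a*x + b*y + c, whose coefficients are estimated jointly with the
    kinematic parameter errors."""
    kin, (a, b, c) = params[:-3], params[-3:]
    res = []
    for q in joint_configs:
        x, y, z = fk_position(kin, q)
        res.append(z - (a * x + b * y + c))
    return np.asarray(res)

def identify(x0, joint_configs):
    # Levenberg-Marquardt refinement of the kinematic parameters
    return least_squares(plane_residuals, x0, args=(joint_configs,), method="lm")
```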
An autoencoder (AE) - extreme learning machine (ELM) model, AE-ELM, is proposed to predict NOx emission concentration based on a combination of the mutual information (MI) algorithm, an AE, and an ELM. First, the importance of the operational variables is computed by the MI algorithm, and the underlying mechanism is analyzed to determine the variables related to NOx emission concentration. Then, the time-delay correlations between the selected variables and the NOx emission concentration are further analyzed to reconstruct the modeling data. Subsequently, the AE is applied to extract hidden features from the input variables. Finally, the ELM algorithm establishes the relationship between the NOx emission concentration and the deep features. Experimental results on practical data show that the proposed model achieves promising performance compared with state-of-the-art models.
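The ELM stage of this pipeline is simple enough to show concretely. A minimal sketch, assuming the MI-based variable selection and AE feature extraction have already produced the feature matrix `X`: an ELM uses a random, untrained hidden layer and solves for the output weights in closed form via the pseudo-inverse.

```python
import numpy as np

class ELMRegressor:
    """Basic extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)       # random nonlinear feature map
        self.beta = np.linalg.pinv(H) @ y      # closed-form output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Usage on the reconstructed modeling data:
# model = ELMRegressor(n_hidden=200).fit(X_train, y_train)
# nox_pred = model.predict(X_test)
```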
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
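A schematic of the objective just described, not the authors' implementation (the exact weighting and high-confidence sample selection in the paper differ): cross-view embeddings of the same high-confidence node are pulled together, while centers of different clusters across views are pushed apart, both via cosine similarity.

```python
import torch
import torch.nn.functional as F

def ccgc_style_loss(z1, z2, centers1, centers2):
    """z1, z2: (n, d) two-view embeddings of the same high-confidence nodes.
    centers1, centers2: (k, d) cluster centers from each view."""
    # Positives: same node across views; maximize cosine similarity.
    pos = F.cosine_similarity(z1, z2, dim=1).mean()
    # Negatives: centers of *different* clusters across views; minimize similarity.
    c1 = F.normalize(centers1, dim=1)
    c2 = F.normalize(centers2, dim=1)
    sim = c1 @ c2.t()                                    # (k, k) cross-view center similarities
    k = sim.size(0)
    neg = sim[~torch.eye(k, dtype=torch.bool)].mean()    # off-diagonal pairs only
    return neg - pos
```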
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we propose a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome, i.e., environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
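A hedged sketch of the RISE-style aggregation described above (the retraining-and-evaluation routine is a placeholder; the paper's masking scheme and normalization may differ): random binary masks over demonstration frames are each scored by the retrained policy's environment return, and a frame's importance is its return-weighted inclusion frequency.

```python
import numpy as np

def importance_map(n_frames, n_trials, train_and_eval, p_keep=0.5, seed=0):
    """train_and_eval(mask) is a placeholder that retrains the black-box IL
    model on the masked demonstrations and returns its environment return."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(n_frames)
    coverage = np.zeros(n_frames)
    for _ in range(n_trials):
        mask = rng.random(n_frames) < p_keep   # keep each frame with prob. p_keep
        ret = train_and_eval(mask)
        scores += ret * mask                   # credit kept frames with the return
        coverage += mask
    return scores / np.maximum(coverage, 1)    # per-frame importance estimate
```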
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays close attention to topic-related words for topic extraction thanks to its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
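A simplified stand-in for the four-component pipeline, under stated assumptions: TF-IDF replaces the paper's enhanced language model, and cluster-specific topics are read off as the top mean-TF-IDF words per cluster. It illustrates the vectorize-reduce-cluster-extract flow, not the authors' method.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def cluster_and_topics(texts, n_clusters=5, n_dims=50, n_words=10):
    """Vectorize, reduce dimensionality, cluster, then extract per-cluster topics."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(texts)
    Z = TruncatedSVD(n_components=min(n_dims, X.shape[1] - 1)).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    vocab = np.asarray(vec.get_feature_names_out())
    topics = {}
    for c in range(n_clusters):
        centroid = X[labels == c].mean(axis=0).A1          # mean TF-IDF per cluster
        topics[c] = vocab[np.argsort(centroid)[::-1][:n_words]].tolist()
    return labels, topics
```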
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
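A schematic of the CLIP-conditioning idea, not the released model (the head design, dimensions, and prompt wording are illustrative assumptions): a fixed text embedding per class modulates a shared segmentation head, so adding a class amounts to adding a prompt rather than a new output channel.

```python
import torch
import torch.nn as nn

class TextConditionedSegHead(nn.Module):
    """Per-class mask logits from pixel features and fixed class text embeddings:
    each class embedding is projected into the pixel-feature space and compared
    with every spatial location by dot product."""

    def __init__(self, text_dim=512, feat_dim=64):
        super().__init__()
        self.proj = nn.Linear(text_dim, feat_dim)  # text space -> feature space

    def forward(self, feats, class_emb):
        # feats: (B, C, H, W) features from the vision backbone.
        # class_emb: (K, text_dim), e.g. frozen CLIP embeddings of prompts
        # like "a computerized tomography of a liver".
        q = self.proj(class_emb)                       # (K, C)
        logits = torch.einsum("bchw,kc->bkhw", feats, q)
        return logits                                  # one mask logit map per class
```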